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Abstract. Verifying the Second-Order Sufficient Condition (SOSC), thus ensuring a sta- 
tionary point locally minimizes a given objective function (subject to certain constraints), 
is an essential component of non-convex computational optimization and equilibrium pro- 
gramming. This article proposes three new "Hessian-free" tests of the SOSC that can be 
implemented efficiently with gradient evaluations alone and reveal feasible directions of 
negative curvature when the SOSC fails. The Bordered Hessian Test and a Matrix Iner- 
tia test, two classical tests of the SOSC, require explicit knowledge of the Hessian of the 
Lagrangian and do not reveal feasible directions of negative curvature should the SOSC 
fail. Computational comparisons of the new methods with classical tests demonstrate the 
relative efficiency of these new algorithms and the need for careful study of false negatives 
resulting from accumulation of round-off errors. 

1. INTRODUCTION 

Verifying the Second-Order Sufficient Condition (SOSC) to certify local optimality of 
points computed by optimization software is an important but underdeveloped component 
of computational optimization. Existing optimization solvers compute points satisfying 
a First-Order or Second-Order Necessary Condition (FONC/SONC), typically without 
checking the corresponding SOSC. As a result, there remain cases in which computed 
points are not optimizers such as, for example, Example 1. 

While verifying the SOSC is important for non-convex optimization, verifying the SOSC 
is essential for equilibrium models increasingly being employed in Economics, Operations 
Research, and Engineering. While much of the theory of computing equilibria relies on 
assumptions of convexity to ensure the first-order conditions imply optimality rather than 
just stationarity [26], examples of non-convex games are rapidly appearing in important 
applications. So long as equilibrium programming methods for general, non-convex games 
are restricted to solving the combined first-order conditions, algorithms can compute simul- 
taneous first-order points but cannot distinguish equilibria from other types of first-order 
points; see, e.g., Example 2. Verifying the SOSC is thus fundamental to properly computing 
equilibria. 

At least two ways to test the SOSC have been known for some time. The classical 
"Bordered Hessian Test" (BHT) [41, 58, 69, 57] can, in principle, be used to verify or reject 
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the SOSC at points computed by optimization software. Computationally implementing 
the BHT relies on a set of nested LU factorizations that can be efficiently taken with 
LU factorization updating. Determining the inertia of a "KKT matrix" [46, 68] is another 
"classical" way to verify or reject the SOSC. Efficient implementations of this "Inertia test" 
are available in the form of existing factorization packages that can compute matrix inertia, 
and may be easily integrated into SQP solvers for constrained optimization problems. 

Both of these "classical" tests require an explicit representation of the Hessian of the 
Lagrangian, which may not be available. Popular optimization solvers including MINOS, 
LANCELOT, SNOPT, KNITRO, and matlab can utilize gradient evaluations alone through quasi- 
Newton updates, finite-differences, or automatic differentiation [68]. The Hessian of the 
Lagrangian will thus not be available for verifying the SOSC when gradient evaluations 
alone are used. "Hessian-free" algorithms requiring only Hessian-vector products [68] could 
seamlessly integrate SOSC checks with optimization and equilibrium solvers that do not 
require users to provide formulas for the second derivatives of the objective or the con- 
straint functions. Such algorithms can fully exploit sparsity patterns in the objective and 
constraint functions, as well as be implemented using directional finite differences to ap- 
proximate Hessian- vector products. 

This article presents three new Hessian-free algorithms for verifying or rejecting the 
SOSC at first-order points in smooth equality-constrained optimization or equilibrium 
problems. The first algorithm is based on Cholesky factorization, the most efficient and 
stable method for testing Hessian positive-definiteness without constraints [44, 49]. The 
second algorithm is based on an "oblique" Gram-Schmidt orthogonalization, and the third 
algorithm is a modification of the Projected Conjugate Gradient algorithm developed for 
constrained quadratic programming [47, 68]. By being Hessian-free, these algorithms can 
take full advantage of sparsity patterns in the Hessian of the Lagrangian when it is known 
and may significantly reduce the number of gradient evaluations necessary to verify or 
reject the SOSC when the Hessian is not available at all. 

Another important feature of an algorithm to verify the SOSC is how easily a feasible 
direction of negative curvature can be computed should the SOSC fail. Certain optimiza- 
tion algorithms use feasible directions of negative curvature to promote global convergence 
to second-order necessary points; see, e.g., [18]. Recovery of a feasible direction of nega- 
tive curvature when an the SOSC fails enables "warm restarts" of optimization algorithms 
that may not already take advantage of second-order information. The classical BHT and 
Inertia tests do not provide an obvious path to computing such directions when the SOSC 
fails. In contrast, the Hessian-free algorithms proposed here make computation of such 
directions straightforward when the SOSC fails. Indeed, two of the Hessian-free algorithms 
presented here fail precisely by finding such directions. 

Finally, useful algorithms for verifying the SOSC should not return false-positives or 
false-negatives due to the accumulation of round-off errors. Unfortunately the numerical 
accuracy of verifying the SOSC with any approach has not yet been addressed. Section 
5 provides computational comparisons of the different algorithms including an example 
that illustrates significant potential for erroneous results due to round-off errors when the 
constraint gradients are very nearly linearly dependent. In particular, no method can be 



HESSIAN-FREE METHODS FOR CHECKING THE SECOND-ORDER SUFFICIENT CONDITIONS IN EQUALITY-CONSTRAINED OP 



considered accurate for certain problems with very nearly linearly independent constraint 
gradients, even with relatively few variables and constraints. 

2. OPTIMIZATION AND EQUILIBRIUM PROBLEMS 

2.1. Optimization Problems. This article considers the equality-constrained, continu- 
ous variable optimization problem 

minimize /(x) 
(1) with respect to x G ~R N 

subject to c(x) = 

where / : R N —?■ M., c : M. N — >• M M , and N, M G N, M < N. The objective (/) and 
constraints (c) are nonlinear and twice continuously differentiable functions of N variables 
x = (x\, . . . , xjv). The component functions of c are denoted by a, for i G {1, . . . , M}. 

Theorem 1. (Optimality Conditions [68, Theorems 12.1, 12.5, 12. 6]) Consider Problem 
(1), and assume the Linear Independence Constraint Qualification (LICQ) is satisfied [68, 
Def. 12.4]. Define the "Lagrangian" £(x, A) = /(x) — A T c(x) ; and let A(x) = Dc(x) G 
M. MxN denote the Jacobian matrix of the constraint function evaluated afxG R . 
FONC: If x* G M. N is a local solution to Problem (1), then there exists some A* G M M 
such that V£(x*, A*) = 0. That is, V/(x*) = YaU A * Vc i( x *) and c ( x *) = 
SONC: Moreover, w T H(x*, A*) w > for all w G C(x*) = { h : A(x*)h = } where 
H(x, A) denotes the Hessian of the Lagrangian (or simply "Hessian"): 

M 

H(x, A) = V^£(x, A) = V^/(x) - ^ A m V^q(x). 

i=l 

SOSC: On i/ie oi/ier /land, suppose x* G 1^ and A* G M M satisfy the FONC and 
w T H(x*,A*) w > for all w G C(x*). T/ien x* is an isolated local solution to Prob- 
lem (1). 

Note that this SOSC also applies to inequality constrained optimization problems at 
strictly complementary stationary points [68, Def. 12.5], and as the continuous part of the 
optimality conditions for mixed-integer nonlinear optimization problems. The remainder 
of this article assumes that x* and A* satisfy the FONC and denotes H(x*,A*), A(x*), 
and C(x*) by simply H, A, and C, respectively. Below the SOSC is denoted compactly 
using the symbol H >~c 0. 

Many existing codes for solving problem (1) solve a variant of the FONC without verify- 
ing the SOSC at computed points [65]. Sequential Quadratic Programming (SQP) methods 
[68, Chapter 18] solve a sequence of local quadratic model problems that, in the equality- 
constrained case, corresponds to applying Newton's method to solve the FONC. SQP 
methods are currently implemented in NPSOL [39], SNOPT [37], filterSQP [30, 29, 31], 
KNITRO [13, 15], and matlab [48, 73, 72, 40, 28]. Augmented Lagrangian methods [68, 
Chapter 17], [7] "penalize" the Lagrangian with a measure of the constraint violation and 
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solve the FONC for a sequence of penalized problems with a variant of Newton's method. 
This method is currently implemented in the MINOS [62, 63, 64] and LANCELOT [17, 16] 
codes. Like Augmented Lagrangian methods, Interior- Point methods [68, Chapter 19] for 
inequality-constrained problems solve the FONC for a sequence of approximate problems 
using a variant of Newton's method. KNITRO [15], LOQO [89, 88], IPOPT [90], and matlab 
[12, 14] currently contain implementations of interior-point methods. 

Obtaining a solution to the FONC is not sufficient to declare the computed point a local 
solution to (1), as shown in Example 1 below. In practice, sufficient decrease conditions 
on a merit function (or a filter mechanism) bias existing solvers towards computing con- 
strained minimizers of / [20, 68, 78]. Indeed, this bias towards optimizers is certainly one 
reason separate codes for verifying the SOSC do not currently exist. Sufficient decrease 
conditions certainly rule out some types of convergence: for example, these conditions rule 
out converge to local constrained maximizers of /. However, algorithms with sufficient 
decrease conditions can still converge to saddle points, as in Example 1. 

Example 1. Consider minimizing f(x) = x 3 over M. x* = satisfies the FONC (f'(0) = 
0) and the SONC (f"(0) = 0), but does not satisfy the SOSC Indeed, f is not locally 
minimized at x* = 0, or at any other finite x. Applying SQP [68, Chapter 18.1] to this 
problem starting at any xo > generates the sequence x n = x n -\/2 = xo/2 n — > as 
n — > oo. Moreover this sequence satisfies the Armijo Condition [68, Eqn. 3.4], a popular 
sufficient decrease condition. ■ 

There also exist "second-order" algorithms that converge only to points at which the 
Hessian is positive semi-definite [18]. Such algorithms make use of feasible directions of 
negative curvature— vectors d 6 C satisfying d T Hd < 0— to promote converge to SONC 
points. Any such algorithm must (periodically) compute a direction of negative curvature 
and thus, by definition, contains a check of the SOSC: should no direction be found, the 
SOSC must hold. While this is certainly sufficient when the Hessian is known explicitly, 
many practical applications do not have such knowledge. Implementations of second- 
order algorithms that rely on quasi-Newton approximations to the Hessian only determine 
whether there is a direction of negative curvature for the approximation, rather than the 
true Hessian, and thus cannot by themselves verify the SOSC. Many other large-scale 
Hessian-free codes apply Conjugate-Gradient (CG) type iterations to solve constrained 
quadratic subproblems [47]. Section 4.3 below, however, demonstrates that convergence of 
CG methods alone is insufficient to verify the SOSC and thus it is conceivable that CG 
methods could miss indefiniteness in H (over C) in some exceptional circumstances. A 
post-convergence verification of the SOSC at computed points appears to be required for 
proper application of Hessian-free methods for large-scale optimization. 



2.2. Equilibrium Problems. Verifying the SOSC is also currently vital for solving non- 
linear, non-convex equality constrained (generalized Nash) equilibrium problems; that is, 
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collections of K G {2, 3, . . . 



} coupled optimization problems: 



(2) 



minimize /fc(xi, . . . ,xjf) 
with respect to G ~R Nk 

subject to Cfc(xi, . . . , xx) = 



where f k : R N -»• K, c fc : 1^ — >■ R M *, N k , M k G N, M fc < JV fc , a nd JV = JVi + • • • + N K 
for all fc G {1, ... , K}. A {local) equilibrium is a point (xj, . . . , x^) G such that x£ G 
R^ fc (locally) solves Problem (2) for all k. Originating in game theory, such equilibrium 
models have been used by Economists and Operations Researchers to study electric power 
[50, 51, 52, 24] and other energy sectors [35, 34], new and used vehicles [5, 42, 43, 84, 
71, 55, 6, 3, 53], entertainment goods [45], and food services [8, 66, 67, 83, 85]. Recent 
"market-systems" research in engineering design is also applying the equilibrium framework 
[59, 82, 79, 80, 81, 33]. See [26] for a review of similar applications and economic history. 

First and second order optimality conditions can be employed in the computation of equi- 
libria. In particular, a FONC for a (local) equilibrium follows from combining the FONC 
for each underlying optimization problem, resulting in a single nonlinear system (e.g., [60]) 
or, more generally, Nonlinear Complementarity Problem (NCP) (e.g., [27, 51]); see [26]. 
"Simultaneously stationary" points that solve combined FONC can often be computed us- 
ing standard methods for nonlinear systems or NCPs such as trust-region Newton methods 
[20, 18], non-smooth Newton methods [74, 21, 61], or semi-smooth methods [61]; see, e.g. 
[2, 51, 60]. However the sufficient decrease conditions that enforce convergence to mini- 
mizers or saddle points in optimization problems now apply to a residual norm instead of 
an objective function, and thus cannot preferentially select equilibria over non-equilibrium 
stationary points. 

Example 2. Morrow & Skerlos [60, Example 10] compute equilibrium prices for a two- 
firm market with heterogeneous consumers. Both firms set the price for a single "branded" 
product whose only non-price attribute is "brand". There are three types of consumers, 
two of which are brand-loyal and the other is brand-indifferent. Demand within each type 
is modeled using a Logit model [86, Chapter 3]. The resulting optimal pricing problem is 
an unconstrained nonlinear optimization problem with a non-convex, multi-modal objec- 
tive. The combined FONC for equilibrium has nine solutions, with only four of these nine 
first-order points locally maximizing both firms' profits. Newton's method applied to the 
combined FONC cannot distinguish between any of these nine points; any of the five spu- 
rious "solutions" could be mistaken for equilibria if the SOSC were not verified. Again, no 
general, globally-convergent method for computing only equilibria in this type of non-convex 
game is currently known. ■ 

Some studies compute equilibria using "sequential optimization", "tattonement" , or 
"Jacobi/Gauss-Seidel" methods [59, 51]; see the discussion of "practitioners methods" in 
[26] for algorithmic details. In general, sequential optimization should enable some "fil- 
tering" of simultaneously stationary points, as the use of optimization algorithms would 
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avoid convergence to simultaneously stationary points that are minimizers of some ob- 
jectives and maximizers of others. However, there are no existing results guaranteeing 
any sort of convergence behavior from sequential optimization methods in either convex 
or non-convex equilibrium problems [26]. Because there do exist algorithms proven to 
converge to solutions of the combined FONC, computing equilibria through solving the 
combined FONC is currently theoretically preferable to sequential optimization methods. 
Furthermore, sequential optimization is likely to be efficient only when the optimization 
problems are weakly coupled; much effort could be wasted when the optimizers for the 
coupled problems strongly depend on one another. 

While strong methods for computing equilibrium points in games with convex objectives 
and feasible sets exist and are preferable to solving the FONC alone, the alternatives to 
solving the combined first-order conditions when the players' objectives and feasible sets 
are non-convex are currently limited. Until methods guaranteed to compute equilibria in 
non-convex games are developed, general equilibrium programming must be undertaken 
with checks of the SOSC. 

2.3. Benefits of Hessian- Free Algorithms. A "Hessian-free" algorithm for checking the 
SOSC will require only matrix-vector products with H, rather than requiring H explicitly 
[68, pg. 170]. Hessian- free algorithms for checking the SOSC have at least two major 
advantages over algorithms that require the Hessians explicitly. 

First, multiplying by H can be more efficient than working directly with the elements 
of H, even when H is known [87, pg. 244]. For example: if H G M> NxN is diagonal, 
then Hx can be computed in N flops rather than the 2N 2 flops required for arbitrary 
H G R NxN . For very large and sparse H the efficiency gained by algorithms that require 
only multiplications by H can reduce computational burden by an order of magnitude, 
without adding the significant overhead required by sparse factorization methods to track 
entries and maintain sparsity. The benefits of this property is well-known and lauded in 
numerical linear algebra. 

Second, the second-order derivatives of the objective and constraints required to explic- 
itly compute H can be challenging to derive, difficult to program, and computationally 
intensive to implement for complex optimization and equilibrium problems [65, 54, 68]. 
Matrix-vector products Hs, however, can be obtained with the gradient of the Lagrangian 
alone using finite-difference approximations [10, 70, 68]: 



for small a; see [20, 10] to choose effective scales a. In fact, this relationship underlies the 
effectiveness of some Newton-Krylov solvers for very large and complex nonlinear systems 



Of course, an approximation to H itself could be constructed using finite differences with 
N gradient evaluations. Indeed, this complete approximation is required to use the BHT 
and Inertia tests when the second derivatives are not explicitly provided; quasi-Newton 
approximations such as the BFGS approximation cannot be used. The new Hessian-free 




[10, 70]. 
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algorithms presented below, however, require at most L < N evaluations of the gradient 
of the Lagrangian. For highly constrained problems (L <C N) this represents a significant 
decrease in function evaluations and thus overall computational burden. 

3. THREE TESTS OF THE SOSC 

Verifying the SOSC for unconstrained problems requires verifying the positive-definiteness 
of the Hessian matrix H. Computationally, this is best accomplished by attempting to 
take a Cholesky factorization of H, a stable, efficient, and symmetry-exploiting algorithm 
[44, 87, 49]. At least three equivalent tests exist for evaluating the "constrained positive- 
definiteness" H >~c of the Hessian of the Lagrangian in Problem (1). 

The most direct test of the SOSC H >-q is to verify the positive-definiteness of a 
reduced L x L Hessian matrix, where L = N — M: 

SOSC Test 1 ((Reduced Matrix Test [57])). H^O if, and only if, W T HW G M. LxL is 
positive-definite for any matrix W G M. NxL whose columns form a basis of C. 

The three algorithms described in Sections 4.1, 4.2, and 4.3 below are implementations 
of this test. 

The remaining two tests involve two (N + M) x (N + M) matrices: 



B 



A 

A T H 



and K 



H A 1 
A 



where in both cases 

e R MxM . B is the well-known "Bordered Hessian" [58, 57, 69]. 
K appears in the FONC for equality-constrained quadratic programs and is thus often 
referred to as a "KKT matrix" [68, Chapter 16]. More generally, K is an example of a 
saddle-point matrix or equilibrium system; for general information see the extensive review 
in [4]. 

The "Bordered Hessian Test" (BHT) is a classical test of H >-q particularly popular 
in economics that uses B: 

SOSC Test 2 ((Bordered Hessian Test [41, 58, 57])). H y c if and only if, the last L 
leading principle minors of B all have sign (-1) M ; specifically, sign(det (B;) ) = (-1) M 
for all i = 1, . . . , L, where Bj is the submatrix of B formed by taking the first 2M + i rows 
and columns. 

Note that the BHT requires L determinant calculations, and thus L LU-factorizations. 
Section 4.4 below outlines an efficient procedure for verifying the BHT using updated LU 
factorization. 

A symmetry-exploiting, single-factorization test based on the KKT matrix K can also 
be derived. The inertia of a matrix is a triple containing the number of positive, negative, 
and zero eigenvalues [44, 68]. Gould [46] proves the following equation: 

Lemma 1. inertia(K) = inertia(W T HW) + (M,M,0) for any basis W of C. 

Because H is positive-definite over C if, and only if, inertia(W T HW) = (L,0,0) — i.e., 
all of (W T HW)'s eigenvalues are positive — Lemma 1 establishes the following test: 
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Table 1. Several methods for verifying the SOSC elaborated on in Sec- 
tion 4. "Hessian-Free" refers to the ability of an algorithm to operate with 
Hessian-vector products, rather than the actual elements of the Hessian. 
"Basis of C?" refers to the requirement that an algorithm start by com- 
puting a basis of C. "Dir. of Neg. Curvature" refers to the ability of an 
algorithm to reveal or compute a feasible direction of negative curvature 
when the SOSC fails. 







Hessian- 


Basis 


Dir. of Neg. 


Test 


Algorithm 


Free? 


of C? 


Curvature 


1 


Alg. 2 (Sec. 4.1) 


Yes 


Yes 


Yes 


1 


Alg. 4 (Sec. 4.2) 


Yes 


Yes 


Yes 


1 


Alg. 5 (Sec. 4.3) 


Yes 


No 


Yes 


2 


Sec. 4.4 


No 


No 


No 


3 


Sec. 4.5 


No 


No 


No 



SOSC Test 3 ((The Inertia Test [46])). H if, and only if, inertia(K) 
(N,M,0). 



inertia(B) 



Section 4.5 describes how this test can be implemented using an inertia- revealing symmetric- 
indefinite factorization to compute the inertia of K [11, 46, 1, 23]. 

4. FIVE ALGORITHMS 

This section derives several algorithms for verifying the SOSC; see Table 1. The focus 
is on three new Hessian-free algorithms for Test 1, (Sections 4.1, 4.2, and 4.3) rather than 
the existing factorization-based BHT and Inertia tests (Section 4.4 and 4.5). Appendix A 
discusses ways to compute a basis W of C, needed for the algorithms described in Sections 
4.1 and 4.2; Algorithm 5 provides a Hessian- free method that does not require computing 
a basis of C. 

4.1. An Implicit, Projected Cholesky Factorization for Test (1). The most direct 
way to verify H >~c given a basis W of C is to attempt to take a Cholesky factorization 
of W T F£W. In the n th step the Cholesky factorization derived in [87, Lecture 23], 

rVi o 

Wl L H n W n:L 



W 1 HW = Li L 



J n-1 



J n-1 



where 



In-l 







fOLr. 



W> +hL H n w n /, 



'a, 








Hi = 

(3) 

and a 



H, 



H n _i — H„_iw n -iw n _ 1 H n _i/a n , 



w^[H n w n for all n S {1, . . . , L}. H >~c if, and only if, a±, . . . , oll > 0. 
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Algorithm 1 ((' 'Classical" Implicit Algorithm 2 (( 'Modified" Implicit 

Cholesky)). Cholosky)). 

1 V <- HW 1 V <- HW 

2 for n= 1, 2 for n= 1, . . . ,L, 

3 for m=l,...,n-l, 3 a w,jv„ 

4 /3 «- vXw n /a m 4 i/ a < 0, /oil, 

5 v„ <- v n - /3v m 5 form = n+l,...,L, 

6 a n «- w^v„ 6 /? <- v^w m /a 

7 i/ a„ < 0, /ail, 7 

Figure 1. "Classical" (Left) and "Modified" (Right) Gram-Schmidt style 
Implicit Cholesky Algorithms. 

Observing that the Cholesky factors do not need to be explicitly computed to compute 
the a numbers determining whether the Cholesky process succeeds or fails, the Cholesky 
process can be reduce to the following: 

Lemma 2. For any n G {1, . . . , L} such that a m = w^v m > for all m < n, a n = wjv n 

where 

n-1 

(4) 



/v w \ 

v n = H n w n = Hw„, - S~] ( m ) \ Tl 

m=l v ' 



Proof. The formula for a n is its definition accepting v n = H n w n ; the second formula for 
v n follows from Eqn. (3) by induction. □ 

Two "implicit" Cholesky algorithms that implement Eqn. (4) are given in Algs. 1 and 
2. While W T HW is not explicitly formed, Algs. 1 and 2 implicitly form the upper (or 
lower) triangle of W T HW. Note that Algs. 1 and 2 only need to compute the products 
Hw n , rather than work with the elements of H explicitly, and is thus Hessian- free. Finally, 
feasible directions of negative curvature can be computed easily should either algorithm 
reject the SOSC: 

Lemma 3. Suppose Algorithm 1 or 2 fails in the n th step, for some n G {1, . . . ,L}, with 
a n < 0. Then d = S1W1 + • • • + s n w n is a feasible direction of negative curvature, where 
s n = 1 and 

_ Sm+l(Vm w m+l) + ; ; ; + Sn(Vm w ») 
[O) s m — 

for all m £ {1, ... ,n— 1}. 

Implementing this formula requires storing the a values computed in Alg. 2 and would 
benefit from storing the triangular array of inner products v^w m+ fc that are computed as 
part of the Implicit Cholesky process, rather than re-computing them to find a direction 
of negative curvature. 
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Algorithm 3 ((" Classical" Diagonaliz 
tion)) . 

1 V <- w 

2 for n = 1, . . . , L, 

3 for m = 1, . . . , n — 1, 

5 v„f- v„ - /3v m 

6 z„ <- Hv„ 

7 a n <r- v^z„ 

8 if u n < 0, /ail, 



Algorithm 4 ((' 'Modified" Diagonaliz 
tion)) . 

1 V <r- W 

2 /or n = 1, . . . , L, 
z Hv„ 

T 



3 
4 
5 
G 
7 



a n <r- v„ z 
*/ a n < 0, /ail, 
/or m = n + I, . . . 

13 <- w^z/a m 



I. 



/3v n 



Figure 2. "Classical" (Left) and "Modified" (Right) Gram-Schmidt style 
Diagonalization algorithms. Note that the vectors zi, . . . ,z^ can overwrite 
wi, . . . , w L . 



Proof. The idea is straightforward: if d = Ws where s solves • • -L^s = e n , then 

d G range(W) = C and d T Hd = a n < 0. Eqn. (5) follows from applying back substitution 
to solve for this s. □ 

4.2. A Diagonalization Method for Test (1). The following facts furnish a different 
version of Test (1): 

Lemma 4. (i) Let the columns of W € R AfxL be any basis for C, let S e R LxL be any 
nonsingular matrix, and set V = WS. W T HW is positive-definite if, and only if, V HV 
is positive-definite, (ii) Let V be any basis of C such that V T HV is diagonal. H >~c if, 
and only if, all o/V T HV's diagonal entries are positive. 



Proof. Claim (i) follows from Sylvester's Law of Inertia and claim (ii) is trivial. 



□ 



Thus, given any basis W of C, the definiteness of H over C can be revealed by finding a 
nonsingular S such that V = WS and V HV is diagonal. An "oblique" Gram-Schmidt 
process makes this possible: 



Lemma 5. Set vi 

(6) 



wi and recursively define 

rt-l 



E 



k=l 



Vj! Hw n 



v^Hvfc / for all k £ {1, 



for any n G {2, . . . , L}, so long as = v^T 
vi, . . . , v n so defined, v^Hvt = for all m, k G {1, . . . , n}, m ^ k. 

Proof. The proof is a straightforward induction. 



. n 



1}. For 



□ 



Again, two algorithms for Test 1 based on Eqn. (6) are provided in Algs. 3 and 4. Note 
that Algs. 3 and 4 require exactly the same linear-algebraic operations as Algs. 1 and 2, 
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but on different quantities. In particular, Algs. 3 and 4 are also Hessian-free. As written, 
Algs. 3 and 4 require an additional N x L matrix of storage for zx, ■ • . , zl, although these 
vectors can be written over wi, . . . ,W£, if the basis of C is not needed after the SOSC 
check. Note, however, no additional computation is required to extract a feasible direction 
of negative curvature from Algs. 3 or 4 when the SOSC fails: 

Lemma 6. Suppose Algorithm 3 or 4 fails in step n < L with a < 0. Then v n is a feasible 
direction of negative curvature. 

Thus, Alg. 3 or 4 is a useful re-organization of Alg. 1 or 2 (respectively) if directions of 
negative curvature are important. 

4.3. Projected Conjugate Gradients for Test (1). The Conjugate Gradient (CG) 
algorithm is a widely-used iterative method for solving symmetric positive-definite linear 
systems [44, 87]. The Projected Conjugate Gradient (PCG) algorithm is an equivalent 
algorithm for solving the FONC for equality-constrained quadratic programming problems 
[47, 68]. Verifying the SOSC with PCG is based on the following converse question: 

Can (P)CG verify that a matrix is (constrained) positive-definite? 

This section describes a Hessian- free approach for checking Test (1) based on existing 
PCG algorithms. For simplicity, the majority of the derivation neglects constraints, and 
considers how CG can be adapted to verify or reject the positive-definiteness of a matrix 
H € ~R NxN . The extension to the constrained case is a straightforward adaptation of this 
discussion and existing PCG methods as described in, e.g. [68, Chapter 16]. CG applied 
to the system Hx = b, b / is denoted by CG(H, b); see [44, pg. 527] or [68, pg. 112] 
for a formal algorithm. Because CG started at xo ^ is equivalent to CG(H,b') started 
at 0, where b' = b — Hxo, it is assumed that CG(H, b) always starts at 0. 

While CG(H,b) converges if H is definite [44, 87, 68], convergence of CG(H, b) alone 
is not an indicator of definiteness: 

Example 3. Let H = diag(l, —1) and b = (1, 2). CG(H, b) converges in two steps, despite 
being applied to an indefinite matrix. However if b = (1,1), then CG(H, b) breaks down 
in the first step. ■ 

This example shows that convergence (or divergence) of CG alone cannot be used as a 
basis for verifying positive-definiteness. 

The foundation for using CG(H,b) as a positive-definiteness check is, in fact, Lemma 
4(h). In exact arithmetic CG(H,b) constructs a set of N' < N linearly independent, El- 
conjugate vectors Pi and evaluates p^Hpj for all i € {1, . . . ,N'} [44, 87]. Verifying that 
p^Hpj > for all i verifies positive-definiteness, by Lemma 4(h), assuming that N' = N. 

Example 4. Again let H = diag(l, —1) and b = (1, 2). CG(H, b) encounters pjHp 2 < 
in the second step, and thus discovers indefiniteness in H. ■ 

Ensuring that p^Hpj > for all i £ {1, . . . , N'} is already a component of some CG 
codes [68]. 



12 



W. ROSS MORROW 



Unfortunately CG(H, b) may converge for some N' < N prior to building a basis of 
M. N and thus cannot always determine whether H is positive-definite. Indeed this "early 
convergence" is the central benefit of CG for solving large linear systems. 

Example 5. Consider again H = diag(l, — 1), and take b = (1,0). CG(H,b) converges 
in a single step (1 = N' < N = 2 ), and cannot identify that H is indefiniteM 

In fact, in exact arithmetic, CG(H, b) always converges with N' < N when b lies in a 
proper invariant subspace of H. (A proper invariant subspace of H is a proper subspace 
W of l w such that Hw £ W for all w 6 W.) The same issue presents in finite-precision 
arithmetic, where CG(H,b) "converges" with N' < N when the residual Hx — b or the 
search directions become "small". 

When CG(H,b) converges with N' < N, the search can be continued consistent with 
Lemma 4(ii) by "restarting" with a new right-hand-side b' that is H-conjugate to all pre- 
vious search directions. The continued search should proceed until (a) positive-definiteness 
has been rejected, (b) the restarted process itself converges, or (c) the remaining dimensions 
of the space have been searched without rejecting positive-definiteness. 

Example 6. To continue the case in Example 5 above choose b' = (0, ±1). b' _L Hb and 
(b') T Hb / < 0; thus the "restarted" CG(H, b') would identify indefiniteness. ■ 

Figure 3 gives a formal algorithm implementing the continued PCG approach— with 
constraints— to verify (or reject) H >-q 0. Algorithm 5 is Hessian-free and, like Algorithm 
4, does not require any additional computation to extract a feasible direction of negative 
curvature: 

Lemma 7. Suppose Algorithm 5 fails in step j < L with rj < 0. Then pj is a feasible 
direction of negative curvature. 

Algorithm 5, being modeled on existing PCG algorithms [47, 68], allows flexibility 
in choosing an operator proj c : M. N — > C that orthogonally projects vectors from 
onto C C R . This projection can always be done using a basis W of C by setting 
projg(r) = W(W T W) _1 W T r, a formula that is particularly simple when W is orthonor- 
mal. However, projection onto C can also be accomplished without computing a basis for 
C [47]. This is an important distinguishing feature of Algorithm 5 relative to Algorithms 2 
and 4: In principle, PCG provides a Hessian-free way to verify H >-q if it is impractical 
to project onto C using a basis of C computed from the constraint gradients. 

Gould et al [47] study two projectors in the traditional PCG algorithm (in the context 
of finite-precision arithmetic). The first sets proj c r = r — A T v where AA T v = Ar can be 
solved using a Cholesky factorization of A A T . A QR factorization or SVD of A T could also 
be used to solve min vgK Af ||A T v — r||2, the equivalent least-squares problem defining v. The 
second method considered in [47] solves an "augmented system" with symmetric-indefinite 
factorization; see [47, 68, 32]. Any necessary factorizations need only be taken once, prior to 
executing Algorithm 5. Continuation, however, requires projecting onto the intersection of 
C with the image under H of the previous search directions. Computationally, this amounts 
to appending certain rows to A or, equivalently, columns to A T . Using Householder QR 
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Algorithm 5 ((Continued Projected Conjugate Gradients)). 

1 choose bgC 

2 i <- 

3 while i < L, 

4 pcg{b,C,i) 

5 if peg (b ,C , i) failed, exit 

6 else if peg (b ,C , i) converged, 

7 C<-C fl span{q fc : k = i + 1, . . . , j} 

8 choose b £ C 

9 i <- j + 1 

Algorithm 6 ((p rojected Conjugate Gradient Step)). 

pcg(b,C,z) 



1 


r <— b, u) <— r T r, r r/y/uj, p ' r 


2 


/or j = i + 1, . . . , L 


3 


t <- W, q 3 <- Hp, jj ^ p T q i; 


4 


if V < 0, f ail 


5 


r^r {r/r))vj, s <- proj c (r), 


13 


*/ l w l ^ converged 


14 


p <- s + (w/r)p 



FIGURE 3. Formal algorithm for the continued PCG method for verifying 
(or rejecting) H 0. 

factorization of A T provides an efficient and stable way to applying and updating this 
projector. 

4.4. Updated LU Factorizations for the BHT, Test (2). The BHT requires com- 
puting the sign of L determinants, each typically computed using LU factorization; see 
[44, pg. 97] or [49, pg. 279]. Explicitly forming the L LU factorizations to compute the 
leading principal minor determinants for the BHT, however, requires an unacceptable and 
unnecessary amount of work asymptotically proportional to L(N + M) 3 . 

An efficient LU updating scheme for computing the associated determinant signs follows 
from a recursive definition of the leading principle submatrices of B. The matrices Bj, 
i = 2, . . . , L (defined in Test 2) satisfy the recursion 

b . G R 2M+i-l 

for some 

7i e K 



B, 



B, 



bi 

H 



11 



W. ROSS MORROW 



An LU factorization of Bj can then be constructed by applying the pivots and row elimi- 
nations from earlier LU factorizations to bj and then zeroing out bj with 2M + i-l new 
row eliminations [44, Section 3.2]. 

For brevity the full LU updating process is not described here; deriving a formal state- 
ment of this update is straightforward, if a bit tedious. The first LU factorization can be 
specially designed to account for the leading M x M submatrix of zeros in B and stan- 
dard pivoting strategies can be used to enhance stability. Existing codes such as LAPACK 
[56], LUSOL [36, 75] PARDISO [76], or HSL's LA15 [19] can be exploited to take the first 
LU factorization as well as, in some cases, implement the factorization updates. While 
computing determinants can be inaccurate due to numerical over- and under- flow [49, pg. 
279], computing the sign of the determinant is exact for the computed LU factorization. 
In any case, the BHT ultimately requires factoring the L matrices Bj and may thus be 
inappropriate for large scale problems without significant sparsity. Furthermore, the BHT 
does not provide an obvious route to computing a direction of negative curvature should 
the SOSC fail. 

4.5. Block Symmetric-Indefinite (LDL) Factorization for the Inertia Test (3). 

Implementing the Inertia Test (3) is, in principle, rather straightforward. The inertia of 
K can be computed using a stable symmetric-indefinite block "LDL" factorization [11, 1, 
49, 23] of the form PKP T = LDL T where P is a permutation matrix, L is unit lower 
triangular, and D is a block-diagonal matrix with lxl and 2x2 blocks; see [49, Chapter 
11]. By Sylvester's Law of Inertia, inertia(K) = inertia(D) [44, pg. 403] and computing the 
inertia of the computed D is efficient and exact even in inexact arithmetic; see [49, Problem 
11.2]. Indeed, symmetric-indefinite codes [23, 77] already have options to compute inertia, 
and the matrix inertia is used in some optimization solvers to promote global convergence 
[38, 15, 78]. However accuracy of the Inertia test in inexact arithmetic depends on whether 
the inertia of the D computed using finite-precision arithmetic actually equals the inertia 
of K. The Inertia test requires factoring a single (N + M) x {N + M) matrix and may thus 
be applicable to large scale problems only when there is significant sparsity. There are, 
however, extremely efficient codes for forming this factorization. Like the BHT, it is not 
obvious how to compute a direction of negative curvature from K when the Inertia Test 
fails. 

5. EXAMPLES 

This section presents several computational examples. Sections 5.3 and 5.4 illustrate the 
relative performance characteristics of the different algorithms discussed above on dense 
test problems for which H, A, and the truth of H >-q is known explicitly. Section 5.5 
demonstrates the advantages of the Hessian- free properties of Cholesky and Diagonalization 
using a test problem from the COPS collection [22]. 

5.1. Computational Details. Algs. 2, 4, 5, the updated LU BHT, and the Inertia test 
have been implemented in C making extensive use of BLAS [9] routines and LAPACK's routines 
for QR (dgeqrf /dormqr), LU (dgetrf), and symmetric-indefinite block LDL (dsytrf) 
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factorizations [56]. The basis W of C used in Algorithms 2 and 4 is computed using a 
QR factorization of A T , primarily for consistency with the updated QR approach to the 
PCG algorithm. Using the SVD would probably provide more stable (though also more 
expensive) computations of W. The updated LU BHT described above was verified using 
a "naive" BHT that computes sign(det(Bj)) via L independent LU factorizations. The 
time savings from updating LU factorizations ranged from 10-90% depending on N and 
M. All computations were undertaken on an Apple MacPro tower with dual quad-core 
2.26 GHz "Nehalem" processors (each with an 8 MB cache) and 32 GB of RAM running 
Mac OS X 10.6.6 and Apple's implementations of BLAS and LAPACK. 

5.2. Generating Dense Random Test Problems. Random test problems were ob- 
tained as follows: Let N G {2, 3, . . . }, M G {1, . . . , N - 1}, and P G {0, . . . , N}. Choose 
symmetric positive-definite A + G M PxP , symmetric negative-definite A_ G ]&( N - p ) x ( N - p ) ; 
orthogonal Q G ~K NxN , upper-triangular R G M. , and set 



(7) H = Q 



A+ 
A 



Q T and A = Q 



Whether H >-q for such H and A is known analytically: 



Lemma 8. Let H and A be defined as in Eqn. (7) and let C = null(A). (i) H >-q 
if, and only if, L = N — M < P. (ii) For uniformly drawn M G {1, ... ,N — 1} and 
P G {0, . . . , N}, P(H y c 0) = (A + 2)/(2(A + 1)). 

The techniques described by Higham [49, pg, 517-518] are used to draw random orthogo- 
nal matrices (Q) and random symmetric matrices with known eigenvalues (A + , A_). Note 
that any such construction is likely to be dense, and thus does not take advantage of the 
Hessian-free characteristics of the Cholesky, Diagonalization, or PCG algorithms. 

5.3. A Well-Conditioned Example. 

Example 7. Let A be defined by an R with off diagonal elements r^j, j > i, drawn from a 
standard normal distribution and diagonal elements r% % i drawn from a normal distribution 
with mean zero and variance (M — i) 2 . ■ 

Over 20,000 numerical tests were run for Ex. 7. 90 distinct values of N were drawn 
from {10, 5000} and, for each N, at least 200 M, P pairs were drawn. Slightly more 
than half (~ 50.3%) of these trials have H >-q 0, with H )fc in the remaining trials; see 
Lemma S (ii) . Every Hessian matrix H drawn has eigenvalues A satisfying 0.1 < |A| < 100. 
The tolerance for convergence in the PCG approach (Alg. 5) was 10~ 10 . 

In Ex. 7, all methods correctly verified or rejected H >-q in most tests. No method 
returned a false positive in any trial, and only the Inertia test returned false negatives. 
Indeed, the performance of the Inertia test degrades when H >-q as N grows, as shown 
in Table 5.3. For N ~ 5, 000, more than 3% of the SOSC checks using the Inertia test gives 
a false negative. No other algorithm returned a single false negative. 

Figs. 4 and 5 compare the time required by each method to computationally verify 
H for Ex. 7. The Inertia test, generally the fastest method, is used as a benchmark 
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Table 2. Percent of trials in Example 7 for which H >~c yet the Inertia 
Test incorrectly declares H 0; i.e. false negatives (FN). 



N 


77 


242 


478 


679 


899 


908 


1153 


1161 


FN 


1.0% 


0.5% 


2.0% 


1.1% 


1.0% 


1.0% 


1.0% 


2.1% 


N 


1530 


1538 


1718 


2306 


2460 


3687 


4637 


4903 


FN 


1.0% 


1.0% 


2.9% 


2.0% 


1.1% 


2.7% 


1.9% 


3.3% 



Table 3. Percent of trials in Example 7 for which H )~c an d PCG was 
continued at least once. 

All TV TV > 20 N > 50 N > 100 N>1, 000 
~~68% 76% 88% 94% 99% 



for comparison; see Fig. 4. Cholesky (Alg. 2) and Diagonalization (Alg. 4) are as fast or 
faster than the Inertia test for highly constrained problems with M > 0.75N. However if M 
is small relative to N, Algs. 2 and 4 can take 10-20 times longer than the Inertia test. Alg. 
4 is slightly slower than Alg. 2 in part because the Hessian-vector products Hw are done 
one-by-one in Alg. 4, instead of "all-at-once" in Alg. 2. When H is known the level-3 BLAS 
routines used to form HW optimize efficient cache memory usage. Generally speaking 
these results are more encouraging than they appear: dsytrf, the code for the Inertia 
test, is highly optimized while the implementations of Algs. 2 and 4 have not yet been 
optimized. "Blocked" variants of Algs. 2 and 4 that fully exploit memory traffic efficiencies 
in the level-3 BLAS are conceptually easy to derive, and will be even more competitive with 
the Inertia test from the perspective of compute time. 

The PCG approach (Alg. 5) and the BHT are as fast or faster than the Inertia test only 
for M ~ N, and can take more than 100 times longer than the Inertia test on problems 
with N > 1000 and small M. Though it is not shown in the plots, continuation is an 
important part of the PCG approach; see Table 5.3. Alg. 5 converged at least once in over 
68% of the trials for which H >~c 0, and would thus "fail" in more than 68% of our cases if 
it were not continued. Moreover as N grows, Alg. 5 tends to converge more often. Based 
on the trials undertaken for Ex. 7, we would "expect" Alg. 5 to converge in ~ _/\r ' 85 cases. 
Specifically, Alg. 5 converged no more than N - 9 times as JV grew in the trials undertaken 
for Ex. 7, and converged at least ~ N ' 8 times in more than 50% of the trials. While these 
predictions should not be extrapolated beyond this example, they clearly demonstrate the 
necessity of continuation in the PCG approach to verifying the SOSC. 

5.4. A Poorly Conditioned Example. 

Example 8. Let A be defined by an R with all elements drawn from a standard normal 
distribution. ■ 
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Compute Time in the 
Inertia Test (seconds) 

100 
1 

0.01 
0.0001 

10 100 1000 
Problem Size (N) 

Figure 4. Compute times for the Inertia test applied in Example 7 in trials 
for which H >~c is true. 

Compute Time Relative to the Inertia Test 



\ Cholesky 

^ — r " v 


: Diagonalization 


EPCG M *s** ,f * m ~ 













5 10 100 1000 10 100 1000 10 100 1000 10 100 1000 



Problem Size {N) Problem Size (N) Problem Size (JV) Problem Size (N) 

Figure 5. Compute times for Example 7 in trials for which H >~c is 
true, relative to the Inertia test. Solid black lines give the maximum and 
minimum ratios or times for a given N over the values of M tested. Thin 
grey lines in the four relative plots represent the maximum ratio of times 
for subsets of trials such that M > 0.25N (lightest), M > 0.5N (midtone), 
and M > 0.75iV (darkest). 

Again, over 20,000 numerical tests were run for Ex. 8 with the same character as for 
Ex. 7, discussed above. 

Ex. 8 demonstrates that numerical accuracy is far from guaranteed for computational 
SOSC checks. Fig. 6 illustrates the fraction of correct tests results and thus also of false 
negatives. The Cholesky, Diagonalization, PCG, and BHT tests appear more stable than 
the Inertia test. However all methods have a false negative rate close to 70% for problems 
with as few as N ~ 100 variables. With iV ~ 1, 000 variables, each test is so overcome by 
roundoff error that virtually no correct results are obtained. As with Ex. 7, there were no 
false positives. 

The relatively poorer performance of the Inertia test in both examples deserves some 
discussion. Weyl's Eigenvalue Pertubation Theorem provides one way to control numerical 
error in the Inertia test: 
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Problem Size (N) 

Figure 6. Fraction of correct test results (left axis) and, symmetrically, 
false negatives (right axis), over all trials in Example 8 for which H >~c 0. 

Theorem 2. (WeyVs Theorem [25, Theorem 4.1],) If X\ > ••• > A n and fix > • • ■ > fi n are 

the eigenvalues o/X and X + E, respectively, for any n x n symmetric matrix X and any 
n x n matrix E, then \\ — \n\ < ||E||2 for all i. 

Corollary 1. Let P(K + E)P T = LDL T be the computed LDL factorization o/K where 
errors are accumulated into a perturbation E o/K; see [49, Chapter 11]. If |/i| min = 
min{|//| : /i G A(K + E)} > ||E||2 then inertia(K) =inertia(D). 

Proof. The Weyl Theorem states that Aj = m + £i for some \ei\ < ||E||2 for all i. If 
\fii\ > ||E||2 for all i, then also sign{Aj} = sign{//j}. Hence inertia(K) = inertia(K + E) = 
inertia(D). □ 

If, however, |^| min < ||E||2 then inertia(K + E) = inertia(D) may not be equal to 
inertia(K). That is, if the magnitude of any eigenvalue of K + E is less than the norm of 
the accumulated round-off errors, this eigenvalue of the perturbed KKT matrix may have 
the wrong sign relative to the true eigenvalue. As N + M grows, the likelihood of both a 
small |/^| min and large ||E||2 increases. 

Note that K can have small eigenvalues even if H does not. Specifically, K has small 
eigenvalues whenever A has nearly linearly dependent columns, as the simple example in 
Appendix B shows. 

Using the "exact" basis Qi : l of C in the Cholesky and Diagonalization tests eliminated 
the false negatives seen in Fig. 6. This suggests that the numerical errors in these two 
tests, as well as perhaps the PCG test, are entirely a consequence of numerical errors 
in the QR factorization of A T . Two remarks along these lines must be made: First, 
using Qi;L is a device of the artificial numerical examples. In practice, some factorization 
must be used compute a basis for C from A. Second, the effect of error in the computed 
basis of C will depend on the spectrum of H. For example if H is positive-definite these 
representational errors in would be irrelevant to the accuracy of the test. On the other 
hand, false negatives can be obtained only if the numerical errors in the representation of C 
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magnify the contribution of H's negative eigenvalues to the quadratic forms. Understanding 
when the construction of a basis of C is sufficiently accurate relative to H will be essential 
to understanding when SOSC checks using the Cholesky and Diagonalization approaches 
are themselves numerically accurate. 

Unfortunately there will also be numerical errors associated with the Cholesky and 
Diagonalization approaches themselves for some Hessians H, independent of what errors 
are made in the construction of a representation of C. What specific properties of H to 
monitor and control in this respect are not yet known and will be the subject of future 
investigations. 

5.5. A Hessian-Free Example from the COPS Collection. The "Thomson Problem" 
of finding the minimal energy configuration of K E {2, 3, . . . } points on a sphere is a a 
large-scale equality-constrained optimization problem from the COPS collection [22]: 



minimize 

(8) 



K-l K 
k=l m=k+l 

with respect to xi , . . . , xx € M 3 



f(xi,...,x K ) = ^2 



1 



I Xfc X m 1 1 2 



subject to ||xfc||| = 1 for all nE{l,...,K} 

Prob. (8) depends only on the Euclidean norms of K vectors in IR 3 , which are invariant over 
orthogonal transformations of M 3 . Thus, Prob. (8) as stated has infinitely many solutions: 
specifically, if (xi, . . . ,x^) is a solution then so is (Qx 1; . . . , Qx^) for each rotation or 
reflection Q of M 3 . An "orthogonally-invariant" Thomson problem can be obtained by 
restricting the locations of the first and second points in a manner inspired by Householder 
QR factorization: 



minimize /(xi, . . . , x^-) 



K-l K | 

^""j ^7^, i 1 1 x fc x m 1 1 2 

k=l m=k+l 



(9) with respect to xi, . . . , x# € M 3 

subject to ||xfc||!/2 = 1/2 for all nE{l,...,K} 

351,2 = 351,3 = ^2,3 = 

Problem (9) has N = 3K variables and M = K + 3 constraints. 

Prob. (9) was solved for specific values of K between 3 and 333 using matlab's Hessian- 
free interior-point algorithm. The SOSC was then verified at the computed (xi, . . . , x#) 
and A using our C implementations of Cholesky, Diagonalization, and the Inertia test. The 
Hessian-free nature of the Cholesky and Diagonalization algorithms was exploited by using 
directional finite differences to approximate Hessian-vector products Hs. For the Inertia 
test the full Hessian H was approximated with finite-differences. The PCG and BHT 
algorithms were not used; the results in Section 5.3 suggest they are not competitive. 

In this case the Hessian-free SOSC checks provided by the Cholesky and Diagonalization 
algorithms tended to reduced time to verify the SOSC relative to the Inertia test by just 
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Cholesky • 
Diagonalization • 

600 800 1000 

3 (AO 

Figure 7. Left: Computed solution to Prob. (9) for K = 100. Right: 
Percent reduction in time to computationally verify H >~c for various 
instances of Prob. (9). When only the black dot is visible, the gray and 
black dots coincide. Note that N = 3K, where K is the number of points 
distributed over the surface of the unit sphere. 

over 20%; see Fig. 7. Recall that the Inertia test was the fastest method on the dense test 
problems above with ~ 30% of the variables constrained. Thus the Hessian-free application 
of the Cholesky and Diagonalization algorithms results in significant computational savings. 

6. CONCLUSIONS 

This article has presented three novel Hessian- free algorithms for verifying (or rejecting) 
the SOSC for constrained continuous optimization. These algorithms also make compu- 
tation of feasible directions of negative curvature easy when the SOSC fails, a feature 
not available in classical tests. Numerical trials have demonstrated (1) the inefficiency of 
the Bordered Hessian Test, (2) the relative speed of the Inertia test, (3) the computa- 
tional efficiency of the new algorithms, especially when their Hessian- free properties can 
be exploited, and (4) the potential for significant loss of accuracy due to round-off error 
using any method, even on small problems. Future work will optimize implementations of 
the new algorithms and undertake a detailed mathematical analysis of round-off errors to 
determine computable certificates of test accuracy. 
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Appendix A. Computing a Basis for C given A 



Both Algorithm 2 and 4 require a basis W for C, rather than the constraint gradients A. 
There are several ways to compute such a basis from A. Implicitly, each method considers 
the equation AW = and uses some factorization of A (or A T ) to find a formula for W. 
Several examples follow: 

• SVD of A: If A = U[ £ ]V T is a full Singular Value Decomposition (SVD; [87, 
Lecture 4]) of A, then W = Vm+1:JV is an orthonormal basis for C. 

• QR factorization of A T : Similarly if A T = QR then W = Qm+i-.n is an orthonor- 
mal basis for C. 

• QR or LU factorization of A: If A = Q[ R S ] for upper-triangular R, or if 
PA = L[ U S ] for upper-triangular U and some permutation matrix P, then 



W 



R X S 
I 



or 



oLxL 



w 



u^s 
I 



are bases for C. In both cases, I G 

Stability considerations typically suggest that either of the first two approaches are prefer- 
able to the third. 



Appendix B. K has Small Eigenvalues When A is Nearly Rank Deficient 

Let H € R NxN and A' € E( M - 1 ) xiV be arbitrary (but full-rank). Let a m G R N denote 
the m th row of A', for m G {!,...,¥- 1}. Choose Pi,..., Pm-i (not all zero), e € 1^ 
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with small 2-norm, and set &m = J2m=i fim&m + £• The /3's and e can be chosen so that 
1 1 ajvf 1 1 2 = 1, even if ||e||2 is very small. Define 
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noting that K + F is singular. Specifically, 
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Thus, K + F has a zero eigenvalue and Weyl's Theorem states that K has an eigenvalue A 
with |A| < ||F|| 2 = ||e||2- 
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